Artificial Intelligence for Humans, Volume 1: Fundamental Algorithms by Jeff Heaton

Author: Jeff Heaton
Language: eng
Format: mobi, epub
Published: 2013-12-11T16:00:00+00:00


Figure 5.1 takes some explanation, as there is quite a bit going on in this graph. The clusters are shown by color. They are also circled, for the benefit of those viewing this in black-and-white media. There are three clusters: red, green, and blue.

Each point is drawn with one of three characters: a plus, an asterisk, or an O. The character indicates the species, as specified by the file in Listing 5.1. If the clustering algorithm had clustered the iris species correctly, every point drawn with the same character would have the same color. As Figure 5.1 shows, this is not the case.
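The comparison between characters (species) and colors (clusters) can also be made numerically. The following is a minimal Python sketch, not code from this book: it counts how many points of each species land in each cluster and reports the fraction that sit with their cluster's majority species. The function names and the six sample assignments are made up for illustration and are not the results shown in Figure 5.1.

```python
from collections import Counter

def cluster_species_table(cluster_ids, species):
    """Count how many points of each species fall in each cluster."""
    table = {}
    for cluster, name in zip(cluster_ids, species):
        table.setdefault(cluster, Counter())[name] += 1
    return table

def purity(table):
    """Fraction of points that belong to the majority species of their cluster."""
    total = sum(sum(counts.values()) for counts in table.values())
    majority = sum(max(counts.values()) for counts in table.values())
    return majority / total

# Hypothetical cluster assignments for six points (not real iris results).
cluster_ids = [0, 0, 1, 1, 2, 2]
species = ["setosa", "setosa", "versicolor", "virginica", "virginica", "virginica"]

table = cluster_species_table(cluster_ids, species)
for cluster, counts in sorted(table.items()):
    print(cluster, dict(counts))
print("purity:", purity(table))
```

A purity of 1.0 would mean that every cluster contains a single species, which is the situation in which each character type would have exactly one color.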

While the clusters are clearly defined, they do not line up with the species. This is mostly unavoidable. Because K-Means starts from a random state, some runs will come closer to picking the species than others. However, Figure 5.1 also shows that two regions of the graph are linearly separable. This means you could draw a straight line between them. Two of the iris species, by contrast, are not linearly separable, because they overlap. Unsupervised clustering never sees the species labels, so it cannot find the separation between these two species on its own.
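To make the role of randomness concrete, here is a minimal Python sketch of K-Means. It is an illustration under stated assumptions, not this book's implementation: the starting centroids are drawn at random from the data (one of several common initializations), the function names are hypothetical, and the eight two-dimensional points are made up so that one group is well separated and the other two overlap.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def k_means(points, k, seed, iterations=100):
    """Cluster points into k groups; the seed controls the random start."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random starting centroids
    assignment = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # nothing moved, so the clustering has converged
        assignment = new_assignment
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignment, centroids

# Made-up data: one well-separated group and two groups that overlap.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
          (5.0, 5.0), (5.2, 5.1), (5.6, 5.4), (5.8, 5.6), (6.0, 5.9)]

for seed in (1, 2, 3):
    assignment, _ = k_means(points, 3, seed)
    print("seed", seed, "->", assignment)
```

Different seeds can produce different cluster numbering and, where groups overlap, different boundaries. The algorithm only ever looks at distances, so nothing pulls those boundaries toward the species.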

You may be wondering how the four-dimensional iris vectors were drawn on a two-dimensional graph. I used R to reduce the dimensions to two for graphing purposes. Dimension reduction is a common technique for data visualization; here it was done with R’s cmdscale function.
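R's cmdscale performs classical multidimensional scaling. For readers without R, the following Python sketch shows the same idea using only the standard library. It is a rough illustration, not a reimplementation of cmdscale: the eigenvectors are found by simple power iteration with deflation rather than a proper eigensolver, all function names are hypothetical, and the five sample points are made up (they happen to lie on a plane inside four-dimensional space, so two dimensions can reproduce their distances).

```python
import math

def squared_distances(points):
    """Matrix of squared Euclidean distances between every pair of points."""
    return [[sum((a - b) ** 2 for a, b in zip(p, q)) for q in points] for p in points]

def double_center(d2):
    """Turn squared distances into the centered inner-product matrix B."""
    n = len(d2)
    row_mean = [sum(row) / n for row in d2]
    grand_mean = sum(row_mean) / n
    # d2 is symmetric, so the column means equal the row means.
    return [[-0.5 * (d2[i][j] - row_mean[i] - row_mean[j] + grand_mean)
             for j in range(n)] for i in range(n)]

def top_eigenpair(b, steps=200):
    """Largest eigenvalue and eigenvector of a symmetric matrix, by power iteration."""
    n = len(b)
    v = [math.sin(i + 1.0) for i in range(n)]  # arbitrary fixed start vector
    for _ in range(steps):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0, v
        v = [x / norm for x in w]
    bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
    return sum(x * y for x, y in zip(v, bv)), v

def classical_mds(points, dims=2):
    """Embed points in `dims` dimensions while preserving pairwise distances."""
    n = len(points)
    b = double_center(squared_distances(points))
    coords = [[] for _ in range(n)]
    for _ in range(dims):
        value, vector = top_eigenpair(b)
        scale = math.sqrt(max(value, 0.0))
        for i in range(n):
            coords[i].append(scale * vector[i])
        # Deflate: remove the component just found so the next pass finds the next one.
        b = [[b[i][j] - value * vector[i] * vector[j] for j in range(n)]
             for i in range(n)]
    return coords

# Made-up four-dimensional points (not iris measurements).
points_4d = [(0.0, 0.0, 0.0, 0.0), (4.0, 0.0, 4.0, 0.0), (0.0, 1.0, 0.0, 1.0),
             (4.0, 1.0, 4.0, 1.0), (2.0, 3.0, 2.0, 3.0)]
embedded = classical_mds(points_4d)
for point in embedded:
    print(point)
```

The two-dimensional coordinates are only determined up to rotation and reflection, so a plot produced this way may look mirrored or turned compared with one produced by another tool. For real data such as the iris vectors, two dimensions generally cannot preserve every distance exactly, so the plot is an approximation.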

Supervised Training

Supervised training is more restricted than unsupervised training. A supervised training set consists of pairs of input and ideal output data. For the iris data set, you would input the four ratio measurements as a four-dimensional input vector. You would likely use one-of-n encoding to encode the species data into an ideal output vector. The machine-learning algorithm would be rated on how well it produced the expected output vector, given the input vector. We will use supervised training with the iris data later in this book.
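A single supervised training pair for the iris data might look like the following Python sketch. Several details here are assumptions rather than anything fixed by the text: the encoding uses 1 for the matching species and 0 for the others (some implementations use other values, such as -1 and 1), the rating uses mean squared error as one common choice, and the "actual" output is a made-up stand-in for what a trained algorithm might produce.

```python
SPECIES = ["setosa", "versicolor", "virginica"]

def one_of_n(label, classes):
    """Encode a class label as a vector with one element per class."""
    return [1.0 if name == label else 0.0 for name in classes]

def mean_squared_error(actual, ideal):
    """Average squared difference between the actual and ideal output vectors."""
    return sum((a - i) ** 2 for a, i in zip(actual, ideal)) / len(ideal)

# One training pair: four measurements in, a one-of-n species vector out.
input_vector = [5.1, 3.5, 1.4, 0.2]  # sepal length, sepal width, petal length, petal width
ideal_output = one_of_n("setosa", SPECIES)

# Hypothetical output from some algorithm, used only to show the rating step.
actual_output = [0.9, 0.1, 0.0]

print("ideal:", ideal_output)
print("error:", mean_squared_error(actual_output, ideal_output))
```

The closer the actual output is to the ideal output, the lower the error, and an error of zero means the algorithm reproduced the expected vector exactly.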


